Ideal bootstrap estimation of expected prediction error for k-nearest neighbor classifiers: Applications for classification and error assessment
Authors
Abstract
Euclidean distance k-nearest neighbor (k-NN) classifiers are simple nonparametric classification rules. Bootstrap methods, widely used for estimating the expected prediction error of classification rules, are motivated by the objective of calculating the ideal bootstrap estimate of expected prediction error. In practice, bootstrap methods use Monte Carlo resampling to estimate the ideal bootstrap estimate because exact calculation is generally intractable. In this article, we present analytic formulae for exact calculation of the ideal bootstrap estimate of expected prediction error for k-NN classifiers and propose a new weighted k-NN classifier based on resampling ideas. The resampling-weighted k-NN classifier replaces the k-NN posterior probability estimates by their expectations under resampling and predicts an unclassified covariate to belong to the group with the largest resampling expectation. A simulation study and an application involving remotely sensed data show that the resampling-weighted k-NN classifier compares favorably to unweighted and distance-weighted k-NN classifiers.
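For orientation, the Monte Carlo approach that the abstract contrasts with exact calculation can be sketched roughly as follows. This is an illustrative sketch only, not the article's analytic formulae (which are not reproduced on this page). It assumes a pooled out-of-bag error as the bootstrap error variant, which is one common choice and may differ from the estimator the article treats as ideal. The function names `knn_predict` and `bootstrap_error` and the parameters `n_boot` and `seed` are hypothetical.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, x, k):
    """Classify x by majority vote among its k Euclidean-nearest training points.

    Vote ties go to the class that appears first among the nearest points.
    """
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def bootstrap_error(xs, ys, k, n_boot=200, seed=0):
    """Monte Carlo approximation of a bootstrap prediction error (assumed variant).

    Each replicate resamples the training set with replacement, then scores the
    k-NN rule on the observations left out of that resample (out-of-bag points).
    Errors are pooled over all replicates.
    """
    rng = random.Random(seed)
    n = len(xs)
    errors = 0
    tested = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        in_bag = set(idx)
        boot_x = [xs[i] for i in idx]
        boot_y = [ys[i] for i in idx]
        for j in range(n):
            if j in in_bag:
                continue  # only score points absent from the resample
            tested += 1
            errors += knn_predict(boot_x, boot_y, xs[j], k) != ys[j]
    return errors / tested if tested else float("nan")
```

Because the result depends on the random resamples, it varies with `n_boot` and `seed`; an exact calculation of the kind the abstract describes would remove that Monte Carlo variability.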
Similar resources
BIOINFORMATICS Prediction Error Estimation: A Comparison of Resampling Methods
Motivation: In genomic studies, thousands of features are collected on relatively few samples. One of the goals of these studies is to build classifiers to predict the outcome of future observations. There are three inherent steps to this process: feature selection, model selection, and prediction assessment. With a focus on prediction assessment, we compare several methods for estimating the ’...
Drought Monitoring and Prediction using K-Nearest Neighbor Algorithm
Drought is a climate phenomenon which might occur in any climate condition and all regions on the earth. Effective drought management depends on the application of appropriate drought indices. Drought indices are variables which are used to detect and characterize drought conditions. In this study, it was tried to predict drought occurrence, based on the standard precipitation index (SPI), usin...
Prediction error estimation: a comparison of resampling methods
MOTIVATION In genomic studies, thousands of features are collected on relatively few samples. One of the goals of these studies is to build classifiers to predict the outcome of future observations. There are three inherent steps to this process: feature selection, model selection and prediction assessment. With a focus on prediction assessment, we compare several methods for estimating the 'tr...
Comparison of the performance of the Cox model and the K-nearest neighbor method in estimating the survival of kidney transplant patients
Introduction & Objective: Cox model is a common method to estimate survival and validity of the results is dependent on the proportional hazards assumption. K- Nearest neighbor is a nonparametric method for survival probability in heterogeneous communities. The purpose of this study was to compare the performance of k- nearest neighbor method (K-NN) with Cox model. Materials & Methods: This ...
Identification of selected monogeneans using image processing, artificial neural network and K-nearest neighbor
Abstract Over the last two decades, improvements in developing computational tools made significant contributions to the classification of biological specimens` images to their correspondence species. These days, identification of biological species is much easier for taxonomist and even non-taxonomists due to the development of automated computer techniques and systems. In this study, we d...
Journal title:
- Statistics and Computing
Volume 10, Issue
Pages -
Publication date 2000